Approximate Boyer-Moore String Matching
Abstract
The Boyer-Moore idea used in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time $O\left(kn\left(\frac{1}{m-k}+\frac{k}{c}\right)\right)$, where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes). An experimental evaluation is reported, showing that the new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent to the Horspool version of the Boyer-Moore algorithm when k = 0.
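The k mismatches search described in the abstract can be illustrated with a minimal Python sketch. It is written from the abstract's description, not taken from the paper: the function name `k_mismatch_search`, the dictionary-based `shift` table, and the 0-based indexing are all assumptions made for illustration. The idea is that any alignment with at most k mismatches must match at least one of the last k + 1 text characters of the current window, so the window can be shifted by the smallest distance that aligns one of those characters with an equal pattern character, capped at m - k.

```python
def k_mismatch_search(text, pattern, k):
    """Return start indices where pattern occurs in text with at most k mismatches.

    Illustrative sketch only; names and table layout are assumptions.
    """
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("requires 0 <= k < len(pattern)")

    # shift[i][a]: smallest s >= 1 such that pattern[i - s] == a,
    # built only for the last k + 1 pattern positions.
    # A missing entry means "no such s" and is treated as m.
    shift = {}
    for i in range(m - k - 1, m):
        table = {}
        for s in range(1, i + 1):
            table.setdefault(pattern[i - s], s)
        shift[i] = table

    matches = []
    j = m - 1  # text index aligned with the last pattern character
    while j < n:
        h, i, neq = j, m - 1, 0
        d = m - k  # the shift never exceeds m - k
        # Compare right to left; stop once more than k mismatches are seen.
        while i >= 0 and neq <= k:
            if i >= m - k - 1:
                d = min(d, shift[i].get(text[h], m))
            if text[h] != pattern[i]:
                neq += 1
            i -= 1
            h -= 1
        if neq <= k:
            matches.append(j - m + 1)
        j += d
    return matches


print(k_mismatch_search("abcabcabd", "abd", 1))  # prints [0, 3, 6]
```

With k = 0 only the last window character contributes to the shift, so the sketch behaves like a Horspool-style search, in line with the abstract's closing remark.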
Similar resources
The Filtering Approaches for the Improved Boyer-Moore Approximate String Matching
The Boyer-Moore algorithm solves exact string matching. Here, the Bad Character Rule of the Boyer-Moore algorithm is extended to solve approximate string matching. Although Tarhio and Ukkonen introduce a basic algorithm, it is similar to the Horspool algorithm. We utilize the concept of their algorithm to implement the Bad Character Rule, and we will obtain a new shift length. When the wind...
Approximate String Matching with Reduced Alphabet
We present a method to speed up approximate string matching by mapping the factual alphabet to a smaller alphabet. We apply the alphabet reduction scheme to a tuned version of the approximate Boyer-Moore algorithm utilizing the Four-Russians technique. Our experiments show that the alphabet reduction makes the algorithm faster. Especially in the k-mismatch case, the new variation is faster tha...
String Matching Rules Used by Variants of Boyer-Moore Algorithm
The string matching problem is a widely studied problem in computer science, mainly due to its many applications in various fields. In this regard, many string matching algorithms have been proposed. Boyer-Moore is the most popular algorithm. Hence, most variants have been derived from the Boyer-Moore (BM) algorithm. This paper addresses the variant of Boyer-Moore algorithm for finding the occurrences of...
Exact pattern matching: Adapting the Boyer-Moore algorithm for DNA searches
Exact pattern matching aims to locate all occurrences of a pattern in a text. Many algorithms have been proposed, but two algorithms, the Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM), are most widespread. Exact matching is the basis of some approximate string matching algorithms like BLAST, and in many cases it is desirable to locate exact rather than approximate matches. Although several studies i...
Boyer-Moore Strategy to Efficient Approximate String Matching
We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which involves representing by a bit number the current state of the search and uses the ability of programming languages to handle bit words. State re...
Journal title:
- SIAM J. Comput.
Volume 22, Issue
Pages -
Publication date: 1993